Final Project of the Udacity Nanodegree Data Scientist Program
For the final project of the Udacity Data Scientist Nanodegree Program, I had to decide what kind of data to work on while applying the acquired skills.
Since I did not want to put too much effort into analyzing a data set that I wouldn't need in the near future, even after completing the Nanodegree (whether at home or at work), I chose financial data. A year ago, after my first Udacity course (Data Analyst), I had already written a program that scrapes, transforms, visualizes and stores my personal financial data from PDF documents (statements of earnings, bank account statements and insurance data) in order to monitor developments over time, identify trends and keep track of the variety of information. For obvious confidentiality reasons, those are not going to be part of the current project.
Around the same time I started investing in stocks, funds and ETFs for the first time in my life (which seemingly everybody did during the Covid-19 pandemic). Knowing full well that I am not going to be the next Warren Buffett, I still enjoy studying a field that I hadn't entered before and that will surely affect me for the rest of my life. This is exactly why I want to use financial stock data for this data science project with Python. So far, time series data hasn't been the focus of the classes, so I additionally needed to do some research (among others with the help of the free Udacity courses "Timeseries Forecasting" and "Machine Learning for Trading").
The project requirements/steps are the following:
Project Idea/Plan:
Please note that none of the insights, data, findings and predictions I make throughout this project should be used as a basis for trading stocks in any way! There may be mistakes, incomplete analyses and biased conclusions. No guarantees!
As I mentioned before, I currently hold stocks and ETFs and plan to acquire more in the future (without risking much). With that said, the following questions come up:
All of the above feed into one further question: "In which stocks or markets should I invest in the future?"
The majority of the stated questions/tasks can be assessed exemplarily using the historical data and metadata of a single stock. However, I plan on working with functions so that the upcoming algorithms can be applied to almost any stock or market.
Being aware that there have already been masses of similar projects and research, I will try to use as many useful existing APIs, modules and strategies as possible to get everything to work and save brain capacity (feel free to check out the credits and sources at the end of the notebook).
I tried three common (and mostly free) Python APIs for gathering historical stock data. Quandl, for example, seems to be well known, but I had some issues navigating through its databases and finding the stock data that I wanted. YFinance (Yahoo Finance) comes in quite handy, but I nevertheless decided to use Alpha Vantage (limited to 5 requests per minute and 500 per day), which delivers a lot of data in a comfortable way. Take into account that you have to get a free API key first (the same applies to Quandl).
About the time intervals: having a regular job, I can't react to price movements within hours or even minutes, which is why I am totally fine with historical data on a daily basis.
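Alpha Vantage's free-tier limit of 5 requests per minute can be respected with a small helper when gathering data for several symbols. This is only a sketch; `rate_limited_fetch` and its `fetch` argument are hypothetical names, not part of the alpha_vantage package.

```python
import time

def rate_limited_fetch(fetch, symbols, per_minute=5):
    """Call fetch(symbol) for each symbol, pausing between batches to
    respect a per-minute request limit (5/min on Alpha Vantage's free tier)."""
    results = {}
    for i, sym in enumerate(symbols):
        if i > 0 and i % per_minute == 0:
            time.sleep(60)  # wait out the current one-minute window
        results[sym] = fetch(sym)
    return results
```

In practice, `fetch` would wrap one of the API calls below (e.g. a `TimeSeries.get_daily_adjusted` request).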
# imports
import numpy as np
import pandas as pd
import math
import matplotlib.pyplot as plt
import plotly.graph_objects as go
import plotly.io as pio
import plotly as py
import datetime
import random
import re
import requests
import time
import yfinance as yf
import ta
import pandas_ta
import smtplib
import json
import tensorflow as tf
from io import BytesIO
from functools import reduce
from pandas.tseries.offsets import DateOffset
from chart_studio import plotly
from plotly.offline import iplot
from plotly.subplots import make_subplots
from email.message import EmailMessage
from keras.preprocessing.sequence import TimeseriesGenerator
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.metrics import mean_squared_error
from sklearn.metrics import SCORERS
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
from scipy import signal
from IPython.display import display, HTML
from tqdm.notebook import tqdm
from alpha_vantage.timeseries import TimeSeries
from alpha_vantage.fundamentaldata import FundamentalData
from alpha_vantage.techindicators import TechIndicators
from alpha_vantage.sectorperformance import SectorPerformances
from scipy.signal import argrelextrema
from statsmodels.tsa.seasonal import seasonal_decompose
%matplotlib inline
#widen notebook
display(HTML("<style>.container { width:90% !important; }</style>")) # increase display width of notebook
# set the renderer so that html export keeps working plotly plots
pio.renderers.default = "notebook"
# set displaying options for pandas and matplotlib
pd.set_option("display.float_format", lambda x: "%.5f" % x)
pd.set_option('display.max_colwidth', 500)
pd.set_option('display.max_columns', 100)
# pd.set_option("display.max_rows", None, "display.max_columns", None)
plt.rcParams['figure.figsize'] = [8, 6]
plt.rcParams['figure.dpi'] = 100 # 200 e.g. is really fine, but slower
#jtplot.style(theme="grade3", context="notebook", ticks=True, grid=False)
As described above, there are several (free) sources of stock data, and I will mostly use the Alpha Vantage API, which theoretically provides the following data:
Unfortunately, there are still lots of stocks for which not all data is present or up to date (especially stocks outside US markets), but at least the historical data is often provided. The following code contains my functions to quickly gather the described data where available. For the project revision I will provide the data as csv-files, since I am not going to hand over my API keys.
# choose a stock to work on (default = BMW.DE, other suggestions: AAPL, MSFT, GOOG)
# symbol = str(input("Type the symbol of the stock that you want to analyze: " or "BMW.DE"))
symbol="BMW.DE"
For now the BMW stock will serve as an example, even though the code is intended to work for all stocks that are available at Alpha Vantage.
# get historical stock data from alpha vantage
def get_stockHistory(symbol):
"""
input:
symbol (str): String containing the exact ticker symbol of the stock of interest
output:
df (DataFrame): dataframe containing the stock's history data (open, high, low, close, close adj, volume, div, split)
meta_data (dict): dictionary containing meta data about the stock of interest
"""
# read api key from .txt
with open('api_keys/alpha_vantage.txt') as f:
api_key = f.readline().strip() # strip trailing newline from the key file
# make alpha_vantage api requests
ts = TimeSeries(key=api_key, output_format="pandas")
df, meta_data = ts.get_daily_adjusted(symbol=symbol, outputsize="full") # get stock history data
df.columns = ["open", "high", "low", "close", "close_adj", "volume", "div", "split"] # rename columns
return df, meta_data
# in case of missing data or rate-limited requests with Alpha Vantage, I'll also provide a function to request the data from yfinance
def get_stockHistory_YF(symbol):
"""
input:
symbol (str): String containing the exact ticker symbol of the stock of interest
output:
df (DataFrame): dataframe containing the stock's history data (open, high, low, close, volume, div, split)
"""
obj = yf.Ticker(symbol)
df = obj.history(period="max")
# valid periods: 1d,5d,1mo,3mo,6mo,1y,2y,5y,10y,ytd,max
# rename columns
df.columns = ["open", "high", "low", "close", "volume", "div", "split"]
# rename index
df.index.names = ['date']
return df
# get company data from alpha vantage
def get_company_overview(symbol):
"""
input:
symbol (str): String containing the exact ticker symbol of the stock of interest
output:
df (DataFrame): dataframe containing the stock's company data
"""
# read api key from .txt
with open('api_keys/alpha_vantage.txt') as f:
api_key = f.readline().strip() # strip trailing newline from the key file
# make alpha_vantage api request
fd = FundamentalData(key=api_key, output_format="pandas")
df = fd.get_company_overview(symbol=symbol)
df = df[0]
return df
# show fundamental stock data if available
try:
stockCompOver = get_company_overview(symbol)
display(stockCompOver.T)
except Exception as e:
print("Error for symbol '{}': {}".format(symbol, e))
Error for symbol 'BMW.DE': Error getting data from the api, no return was given.
Unfortunately, there is no company overview data for the symbol "BMW.DE".
# get dates of quarterly reports from alpha vantage if available
def get_earnings_calendar(horizon, symbol):
"""
input:
horizon (str): time period over which the dates of the company's earnings releases / quarterly reports are published. Either "3month", "6month", or "12month".
symbol (str): String containing the exact ticker symbol of the stock of interest
output:
df (DataFrame): dataframe containing the dates of the company's earnings releases / quarterly reports
"""
BASE_URL = r"https://www.alphavantage.co/query?"
with open('api_keys/alpha_vantage.txt') as f:
api_key = f.readline().strip() # strip trailing newline from the key file
url = f'{BASE_URL}function=EARNINGS_CALENDAR&symbol={symbol}&horizon={horizon}&apikey={api_key}'
response = requests.get(url)
df = pd.read_csv(BytesIO(response.content))
return df
try:
stockEarnCal = get_earnings_calendar("12month", symbol)
except Exception as e:
print("Error for symbol '{}': {}".format(symbol, e))
# show dates of quarterly earnings if available
stockEarnCal
| symbol | name | reportDate | fiscalDateEnding | estimate | currency |
|---|---|---|---|---|---|
Unfortunately, there is no earnings calendar data for the symbol "BMW.DE".
# combined api request that can also output technical indicators and, if available, fundamental data
def get_stockData(symbol, hist=True, techInd=False, earnCal=False, compOver=False, secPerf=False):
"""
input:
symbol (str): String containing the exact ticker symbol of the stock of interest
hist, techInd, earnCal, compOver, secPerf (bool): get historic/technical indicator/earnings calendar/company overview/sector performance data if True
output:
df_x (DataFrame): dataframes containing the stocks data asked for with the input labels
"""
# create dataframe shells
df_ts = pd.DataFrame()
df_ti = pd.DataFrame()
df_sma = pd.DataFrame()
df_ema = pd.DataFrame()
df_rsi = pd.DataFrame()
df_adx = pd.DataFrame()
df_mom = pd.DataFrame()
df_bb = pd.DataFrame()
df_ec = pd.DataFrame()
df_co = pd.DataFrame()
df_sp = pd.DataFrame()
# read api key from .txt
with open('api_keys/alpha_vantage.txt') as f:
api_key = f.readline().strip() # strip trailing newline from the key file
# make alpha_vantage api requests
ts = TimeSeries(key=api_key, output_format="pandas")
ti = TechIndicators(key=api_key, output_format="pandas")
fd = FundamentalData(key=api_key, output_format="pandas")
sp = SectorPerformances(key=api_key, output_format="pandas")
# historical data
if hist:
df_ts, _ = ts.get_daily_adjusted(symbol=symbol, outputsize="full") # get stock history data
df_ts.columns = ["open", "high", "low", "close", "close_adj", "volume", "div", "split"] # rename columns
# technical indicators
if techInd:
df_sma, _ = ti.get_sma(symbol=symbol, interval='daily', time_period=60, series_type="close") # get sma
df_ema, _ = ti.get_ema(symbol=symbol, interval='daily', time_period=60, series_type="close") # get ema
df_rsi, _ = ti.get_rsi(symbol=symbol, interval='daily', time_period=60, series_type="close") # get rsi
# df_adx, _ = ti.get_adx(symbol=symbol, interval='daily', time_period=60) # get adx (ignored in order not to exceed 5 requests per minute)
# df_mom, _ = ti.get_mom(symbol=symbol, interval='daily', time_period=60, series_type="close") # get mom (ignored in order not to exceed 5 requests per minute)
df_bb, _ = ti.get_bbands(symbol=symbol, interval='daily', time_period=60, series_type="close") # get bbands
df_bb.columns = ["BBmi", "BBlo", "BBup"]
# earnings calendar
if earnCal:
df_ec, _ = fd.get_earnings_calendar(symbol=symbol, horizon="12month")
# company overview
if compOver:
df_co = fd.get_company_overview(symbol=symbol)
df_co = df_co[0]
# sector performance info
if secPerf:
df_sp, _ = sp.get_sector() # get sector
# merge historical data with indicators
dfs = [df_ts, df_sma, df_ema, df_rsi, df_adx, df_mom, df_bb]
df_comp = reduce(lambda left, right: pd.merge(left, right, how="outer", left_index=True, right_index=True), dfs)
return df_comp, df_ts, df_ec, df_co, df_sp
# stockHist_comp, stockHist, _, _, _ = get_stockData(symbol, hist=True, techInd=False, earnCal=False, compOver=False, secPerf=False)
# stockHist_comp
# creating csv containing stock history and txt containing meta data for testing without api requests
# stockHist.to_csv("data/datasets/stockHist.csv")
# json.dump(stockMeta, open("data/datasets/stockMeta.txt",'w'))
# # reading stock history from csv (for other users without api keys)
stockHist = pd.read_csv("data/datasets/stockHist.csv", index_col="date", parse_dates=True)
# # reading meta data from txt
# stockMeta = json.load(open("data/datasets/stockMeta.txt"))
stockHist
| open | high | low | close | close_adj | volume | div | split | |
|---|---|---|---|---|---|---|---|---|
| date | ||||||||
| 2021-12-16 | 89.75000 | 90.31000 | 89.27000 | 89.64000 | 89.64000 | 1364574.00000 | 0.00000 | 1.00000 |
| 2021-12-15 | 88.89000 | 89.41000 | 88.23000 | 88.27000 | 88.27000 | 794212.00000 | 0.00000 | 1.00000 |
| 2021-12-14 | 89.98000 | 90.04000 | 88.22000 | 88.40000 | 88.40000 | 1116145.00000 | 0.00000 | 1.00000 |
| 2021-12-13 | 89.80000 | 91.88000 | 89.55000 | 89.88000 | 89.88000 | 1086537.00000 | 0.00000 | 1.00000 |
| 2021-12-10 | 89.53000 | 90.19000 | 88.95000 | 89.66000 | 89.66000 | 1415043.00000 | 0.00000 | 1.00000 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2005-01-07 | 34.69000 | 34.73000 | 34.31000 | 34.60000 | 19.81840 | 1864405.00000 | 0.00000 | 1.00000 |
| 2005-01-06 | 34.44000 | 34.91000 | 34.43000 | 34.71000 | 19.88140 | 2130931.00000 | 0.00000 | 1.00000 |
| 2005-01-05 | 34.22000 | 34.69000 | 34.05000 | 34.54000 | 19.78400 | 3314502.00000 | 0.00000 | 1.00000 |
| 2005-01-04 | 33.60000 | 34.52000 | 33.60000 | 34.42000 | 19.71530 | 3613994.00000 | 0.00000 | 1.00000 |
| 2005-01-03 | 33.41000 | 33.85000 | 33.40000 | 33.75000 | 19.33150 | 1742708.00000 | 0.00000 | 1.00000 |
4306 rows × 8 columns
stockHist.describe()
| open | high | low | close | close_adj | volume | div | split | |
|---|---|---|---|---|---|---|---|---|
| count | 4306.00000 | 4306.00000 | 4306.00000 | 4306.00000 | 4306.00000 | 4306.00000 | 4306.00000 | 4306.00000 |
| mean | 62.41084 | 63.11240 | 61.64346 | 62.39356 | 48.25319 | 2427249.84742 | 0.00785 | 1.00000 |
| std | 22.29050 | 22.43170 | 22.08830 | 22.26896 | 21.81710 | 1488484.45383 | 0.14565 | 0.00000 |
| min | 17.28000 | 17.81500 | 16.00000 | 17.04000 | 10.98350 | 0.00000 | 0.00000 | 1.00000 |
| 25% | 41.46250 | 41.95000 | 40.85250 | 41.40125 | 25.37920 | 1462853.75000 | 0.00000 | 1.00000 |
| 50% | 64.68500 | 65.26500 | 63.94500 | 64.63500 | 50.59135 | 2063199.00000 | 0.00000 | 1.00000 |
| 75% | 81.75000 | 82.54000 | 80.73750 | 81.59750 | 67.05378 | 2895719.25000 | 0.00000 | 1.00000 |
| max | 123.30000 | 123.75000 | 120.35000 | 122.60000 | 95.89000 | 17588760.00000 | 4.00000 | 1.00000 |
stockHist.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 4306 entries, 2021-12-16 to 2005-01-03
Data columns (total 8 columns):
 #   Column     Non-Null Count  Dtype
---  ------     --------------  -----
 0   open       4306 non-null   float64
 1   high       4306 non-null   float64
 2   low        4306 non-null   float64
 3   close      4306 non-null   float64
 4   close_adj  4306 non-null   float64
 5   volume     4306 non-null   float64
 6   div        4306 non-null   float64
 7   split      4306 non-null   float64
dtypes: float64(8)
memory usage: 302.8 KB
Fortunately, when using historical stock data from the Alpha Vantage API, no major data wrangling is necessary. There would be much more work if the plan were to scrape fundamental data for each timestep in the past in order to improve the analysis; that won't be part of this notebook, though.
When I started comparing the stock data from Alpha Vantage with the charts I found on various broker websites, I was irritated by differences in value, which I later found out resulted from the distinction between adjusted and unadjusted close values. The adjusted close values are calculated with respect to dividends, splits and new offerings. Since the historical OHLC (Open, High, Low, Close) data relates to the unadjusted close values, I will use the adjusted value only when there is no interaction with the open, high, low or volume data. For stocks with splits in their historical data, I might need a more detailed approach to avoid bias, for example in a machine learning model. By the way: the YFinance API doesn't provide adjusted close values, which is one of my reasons to use Alpha Vantage where possible.
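If adjusted open/high/low values were ever needed alongside close_adj, the per-row ratio close_adj/close carries the dividend/split adjustment and can be applied to the remaining price columns. This is only a sketch on made-up numbers (an adjustment factor of 0.5 for illustration), not part of the notebook's pipeline:

```python
import pandas as pd

# toy history: close_adj differs from close by a constant adjustment factor
df = pd.DataFrame({
    "open":      [34.69, 34.44],
    "high":      [34.73, 34.91],
    "low":       [34.31, 34.43],
    "close":     [34.60, 34.71],
    "close_adj": [17.30, 17.355],  # factor 0.5 for illustration
})

# per-row adjustment factor implied by close vs. adjusted close
factor = df["close_adj"] / df["close"]

# apply the same factor to the unadjusted price columns
for col in ["open", "high", "low"]:
    df[col + "_adj"] = df[col] * factor
```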
# plot close and adjusted close prices of the entire stock history as well as split and dividends information
fig = make_subplots(specs=[[{"secondary_y": True}]])
fig.update_layout(xaxis_title="", yaxis_title="Price", template="plotly_white")
fig.update_yaxes(title_text="Dividends", secondary_y=True)
fig.add_trace(go.Scatter(x=stockHist.index, y=stockHist.close, mode="lines", name="close"))
fig.add_trace(go.Scatter(x=stockHist.index, y=stockHist.close_adj, mode="lines", name="close_adj"))
fig.add_trace(go.Bar(x=stockHist.index, y=stockHist["div"], name="dividends"), secondary_y=True)
fig.update_traces(marker_color='white', marker_line_color='darkgreen',
marker_line_width=2, width=1000 * 3600 * 24 * 31, opacity=0.6, secondary_y=True)
# add vertical lines at split dates (split values other than 1), if any
split_dates = stockHist[stockHist.split != 1].index
if split_dates.empty:
    # check if there were any splits in the stock's history (values other than 1):
    print("No splits found in historic data. Unique split values:", stockHist.split.unique())
else:
    for split_date in split_dates:
        fig.add_vline(x=split_date, line_color="orange", line_width=1, line_dash="dash")
fig['layout']['yaxis2']['showgrid'] = False
fig.show()
fig.write_html("data/results/reports/close_plot.html")
No splits found in historic data. Unique split values: [1.]
Of course it would be naive to think I could easily predict future stock movements with the basic knowledge that I have, but still... you have to start somewhere, right? Let's not rush things by trying to predict prices a month ahead, but rather investigate the general ideas needed to understand and estimate price behaviour, then put the criteria into scalable measures, and finally try to at least assume an up or down trend a few days into the future.
Usually the analysis of a stock is divided into fundamental analysis and technical analysis. For a quick understanding of the underlying differences I found the following page helpful:
Fundamental analysis can be understood as "looking at aspects of a company in order to estimate its value". This can be an analysis of the company's general condition, its decisions and communications, but it can also cover "outside influences" like the current pandemic, or even tweets of famous people about the company or its market. Fundamental analysis can be a matter of politics, natural disasters and more, and is used especially for long-term investments. Since the focus of this project is rather short-term decision making, I will not discuss fundamental analysis any further, but rather the more interesting analysis for a programmatic approach: technical analysis.
Technical analysis deals with the quantification of a company's stock performance, usually over a short- or mid-term timespan. There is a huge number of so-called indicators, calculated from available stock data, that traders use to assess current price movements. Those indicators can also be visual patterns in financial charts, like a "candlestick diagram", or other techniques like seasonal decomposition (although I think that this technique is better suited to less unsteady time series such as yearly sales). Since this project is more about the programmatic approach and less about the financial background, I will introduce just a few indicators for basic use cases.
All upcoming features/indicators will be stored in a copy of the stockHist data frame called "stockHist_comp" (comp = "complex").
try:
stockHist_comp # might have been already created with the get_stockData function
except NameError:
stockHist_comp = stockHist.copy()
With the help of the seasonal_decompose function from statsmodels, we can determine trend and seasonal patterns in the historical data. Assuming that we need a multiplicative decomposition, the components will look as in the following charts (the period is set to 252 working days, approximately one year of data). The trend component shows, as its name indicates, the general movement of the stock price, whilst the seasonal component in the chosen time period shows a seasonal/cyclic pattern indicating price movements that happen each year. The residual component basically reflects uncertainty, stock volatility and unforeseeable changes in price (e.g. the Covid-19 impact at the beginning of 2020).
# use statsmodels seasonal_decompose to find seasonal/cyclic pattern in the historic data
decompose_result_mult = seasonal_decompose(stockHist.close_adj.iloc[::-1], model="multiplicative", period=252) # , extrapolate_trend='freq'?
stockHist_comp["trend"] = decompose_result_mult.trend
stockHist_comp["seasonal"] = decompose_result_mult.seasonal
stockHist_comp["residual"] = decompose_result_mult.resid
fig = make_subplots(rows=2, cols=1, shared_xaxes=True, row_width=[0.5, 0.5])
fig.update_layout(title="Seasonal Decomposition", template="plotly_white")
fig.add_trace(go.Scatter(x=stockHist_comp.index, y=stockHist_comp.close_adj, name="close_adj"), row=1, col=1)
fig.add_trace(go.Scatter(x=stockHist_comp.index, y=stockHist_comp.trend, name="trend"), row=1, col=1)
fig.add_trace(go.Scatter(x=stockHist_comp.index, y=stockHist_comp.seasonal, name="seasonal"), row=2, col=1)
fig.add_trace(go.Scatter(x=stockHist_comp.index, y=stockHist_comp.residual, name="residual"), row=2, col=1)
fig.show()
fig.write_html("data/results/reports/seasonal_decomposition_plot.html")
If you hide the residual by clicking on it, you will see the seasonal pattern that seems to occur each year. This could be caused by the communication of quarterly reports (orange dashed lines in the next chart), seasonal client buying behaviour, the distribution of dividends (green dashed lines in the next chart) or the possibility to buy "preference shares" (for BMW at the beginning of November).
def compare_years(df, feature, num_years=3):
"""
input:
df (DataFrame): data frame containing the features that shall be compared for each year
feature (string): name of the feature that shall be compared for each year
num_years (int): number of years in the past that shall be compared
output:
fig (plotly figure object): saved as html
"""
current_year = datetime.datetime.now().year
years = []
for i in range(num_years+1):
years.append(current_year-i)
years = years[::-1]
plot_layout = go.Layout(title="Comparison of {} in the last years".format(feature))
fig = go.Figure(layout=plot_layout)
fig.update_layout(xaxis_title="Workday", yaxis_title=feature, template="plotly_white")
quarters = [df.loc[str(years[-2])].shape[0]/4, df.loc[str(years[-2])].shape[0]/2, df.loc[str(years[-2])].shape[0]/4*3]
for quarter in quarters: fig.add_vline(x=quarter, line_width=2, line_dash="dash", line_color="orange")
for year in years:
df_temp = df.loc[str(year)].sort_values(by="date", ascending=True)
df_temp.reset_index(inplace=True)
fig.add_traces(go.Scatter(x=df_temp.index, y=df_temp[feature]/df_temp[feature], mode="lines",
name="Horizontal", showlegend=False))
fig.add_traces(go.Scatter(x=df_temp.index, y=df_temp[feature], mode='lines',
name=feature+" "+str(year), fill="tonexty"))
for placeholder in df_temp.loc[df_temp["div"]!=0].index.values:
fig.add_vline(x=placeholder, line_width=2, line_dash="dash", line_color="lightgreen")
fig.show()
fig.write_html("data/results/reports/{}_plot.html".format(feature))
compare_years(stockHist_comp, "seasonal", num_years=4)
compare_years(stockHist_comp, "residual", num_years=4)
The comparison of the residuals over the years clearly shows the impact of the pandemic's onset in March 2020, when all markets decreased significantly.
def seasonal_forecast(df):
"""
input:
df (DataFrame): data frame containing the seasonal component features and close_adj values
output:
fig (plotly figure object): 2 diagrams that show the seasonal pattern and a possible future predicted with it
"""
feature = "seasonal"
current_year = datetime.datetime.now().year
plot_layout = go.Layout(
title="{} in the current year compared to the last year".format(feature)
)
fig = go.Figure(layout=plot_layout)
fig.update_layout(xaxis_title="Workday", yaxis_title=feature, template="plotly_white")
df_current_year = df.loc[str(current_year)].sort_values(by="date", ascending=True)
df_current_year.reset_index(inplace=True)
df_current_year["close_adj_mean"] = df_current_year.close_adj.mean()
df_last_year = df.loc[str(current_year-1)].sort_values(by="date", ascending=True)
df_last_year.reset_index(inplace=True)
# try to find the x-offset to synchronize the seasonal pattern
lag = []
x = df_last_year[feature].to_numpy(na_value=0)
y = df_current_year[feature].to_numpy(na_value=0)
correlation = signal.correlate(x, y, mode="full")
lags = signal.correlation_lags(x.size, y.size, mode="full")
lag = lags[np.argmax(correlation)]
df_current_year = df_current_year.shift(lag) # somehow the estimated lag doesn't always fit
df_current_year.dropna(subset=["date"], inplace=True)
fig.add_traces(go.Scatter(x=df_last_year.index, y=df_last_year[feature], mode='lines', name=feature+" last year", marker=dict(color="blue")))
fig.add_traces(go.Scatter(x=df_current_year.index, y=df_current_year[feature], mode='lines', name=feature+" current year", marker=dict(color="red")))
fig.show()
fig.write_html("data/results/reports/seasonal_lag_plot.html")
# predict future close_adj with seasonality (holding last price constant)
future = df_last_year.loc[df_last_year.index > df_current_year.index.max(), ["seasonal"]]
offset_correction = future.seasonal.values[0] * df_current_year.close_adj.values[-1] - df_current_year.close_adj.values[-1]
future["close_adj"] = future.seasonal * df_current_year.close_adj.values[-1] - offset_correction # maybe better calculate with continued SMA?!
future.index = pd.bdate_range(start=str(df_current_year.date.dropna().values[-1])[:10], end=str(datetime.datetime.now().year)+"-12-31")[:len(future)]
fig = make_subplots(specs=[[{"secondary_y": True}]])
fig.update_layout(
title="Close_adj Prediction with seasonality holding everything constant",
xaxis_title="Workday",
yaxis_title="Close_adj",
template="plotly_white"
)
fig.add_trace(go.Scatter(x=df_current_year.date, y=df_current_year.close_adj, mode="lines",
name="Past Close_adj", marker=dict(color="blue")))
fig.add_trace(go.Scatter(x=future.index, y=future.close_adj, mode="lines",
name="Future Close_adj", marker=dict(color="red")))
fig.add_trace(go.Scatter(x=df_current_year.date, y=df_current_year.trend, mode="lines",
name="Past Trend", marker=dict(color="black"), line_dash="dash"))
fig.add_trace(go.Scatter(x=df_current_year.date, y=df_current_year.close_adj_mean, mode="lines",
name="Close_adj Mean", marker=dict(color="lightblue"), line_dash="dash"), secondary_y=False)
fig.add_trace(go.Scatter(x=df_current_year.date, y=df_current_year.close_adj.mean()*df_current_year.seasonal,
mode="lines", name="Close_adj if trend was constant", fill="tonexty", marker=dict(color="lightblue")), secondary_y=False)
fig.show()
fig.write_html("data/results/reports/seasonal_predict_plot.html")
seasonal_forecast(stockHist_comp)
The light blue trace (seasonality times the mean of this year's close_adj) shows that the tendencies are quite similar to the actual price movement (blue). Disregarding the unknown trend, the future close_adj could look like the red line, taking ONLY seasonality into account. However, as seen in the charts before, the residual component mostly outshines the seasonal component. Maybe there are other sectors/stocks with stronger seasonal or cyclic patterns.
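The lag estimation inside seasonal_forecast uses scipy's cross-correlation, where the sign convention is easy to get wrong. A toy example with a known shift:

```python
import numpy as np
from scipy import signal

# y is x shifted two steps to the right (y lags x by 2);
# the trailing zeros make np.roll equivalent to a plain shift here
x = np.array([0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 0.0])
y = np.roll(x, 2)

correlation = signal.correlate(x, y, mode="full")
lags = signal.correlation_lags(x.size, y.size, mode="full")
lag = lags[np.argmax(correlation)]
# scipy's convention: a negative lag means y is delayed relative to x
```

This may explain why "the estimated lag doesn't always fit" in the function above: the returned lag has to be interpreted (and its sign applied) consistently with this convention before shifting the dataframe.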
# add technical analysis features with ta and or pandas_ta module
# ta.add_all_ta_features(stockHist_comp, open="open", high="high", low="low", close="close", volume="volume")
# stockHist_comp.drop(columns=["trend_psar_up", "trend_psar_down"], inplace=True) # those columns contain too many nans
# calculate SMA and EMA of close_adj
stockHist_comp['SMA'] = stockHist_comp.sort_values(by="date", ascending=True)["close_adj"].rolling(window=20).mean()
stockHist_comp['EMA'] = stockHist_comp.sort_values(by="date", ascending=True)["close_adj"].ewm(span=20).mean()
# difference between close_adj value and its moving average
stockHist_comp["diffCloseSMA"] = stockHist_comp.close_adj - stockHist_comp.SMA
fig = go.Figure()
fig.update_layout(title="Difference of Closing Value and SMA", yaxis_title="Price", template="plotly_white")
fig.add_trace(go.Scatter(x=stockHist_comp.index, y=stockHist_comp.SMA, name="SMA", line_color="blue"))
fig.add_trace(go.Scatter(x=stockHist_comp.index, y=stockHist_comp.close_adj, name="close_adj", line_color="green", line_width=3))
fig.add_trace(go.Scatter(x=stockHist_comp.index, y=stockHist_comp.close_adj+stockHist_comp.diffCloseSMA, name="close_adj + diffCloseSMA", opacity=.3, line_color="lime", fill="tonexty"))
fig.show()
fig.write_html("data/results/reports/diffCloseSMA_plot.html")
A significant light green area on top of close_adj suggests selling, while light green areas beneath close_adj suggest buying.
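This buy/sell reading can be turned into a crude threshold signal. The following is only a sketch on toy data; the three-day window and the one-standard-deviation threshold are arbitrary illustrative choices, not part of the notebook's strategy:

```python
import pandas as pd

close = pd.Series([1.0, 1.0, 1.0, 1.0, 1.0, 10.0])
sma = close.rolling(window=3).mean()
diff = close - sma  # same idea as the diffCloseSMA column above

# flag days where the price strays more than one standard deviation
# of the difference away from its moving average
thresh = diff.std()
trade_signal = pd.Series(0, index=close.index)
trade_signal[diff > thresh] = -1   # far above the SMA: sell suggestion
trade_signal[diff < -thresh] = 1   # far below the SMA: buy suggestion
```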
# indicator that shows when the close_adj value crosses the Bollinger band (even though more interesting would be the re-entry into the Bollinger band, indicating a trend)
stockHist_comp[["BBlow", "BBmid", "BBup", "BBwidth", "BBperc"]] = pandas_ta.bbands(close=stockHist_comp.sort_values(by="date", ascending=True)["close_adj"], length=20)
# RSI indicator (indicates overboughtness/oversoldness)
stockHist_comp["RSI"] = pandas_ta.rsi(close=stockHist_comp.sort_values(by="date", ascending=True)["close_adj"], length=10, append=True)
# create subplots showing Bollinger Bands, SMA and RSI in combination
# BB
fig = make_subplots(rows=2, cols=1, shared_xaxes=True, row_width=[0.25, 0.75])
fig.update_layout(title="Bollinger Bands", yaxis_title="Price", template="plotly_white")
fig.add_trace(go.Scatter(x=stockHist_comp.index, y=stockHist_comp.SMA, name="SMA", line_color="blue"), row=1, col=1)
fig.add_trace(go.Scatter(x=stockHist_comp.index, y=stockHist_comp.BBmid, name="BBmid", line_color="lightblue"), row=1, col=1)
fig.add_trace(go.Scatter(x=stockHist_comp.index, y=stockHist_comp.BBlow, name="BBlow", line_color="black"), row=1, col=1)
fig.add_trace(go.Scatter(x=stockHist_comp.index, y=stockHist_comp.BBup, name="BBup", line_color="black"), row=1, col=1)
fig.add_trace(go.Scatter(x=stockHist_comp.index, y=stockHist_comp.close_adj, name="close_adj", line_color="red"), row=1, col=1)
# RSI
fig.add_trace(go.Scatter(x=stockHist_comp.index, y=stockHist_comp.RSI, name="RSI", line_color="orange"), row=2, col=1)
# Add upper/lower bounds
fig.update_yaxes(range=[-10, 110], row=2, col=1)
fig.add_hline(y=0, col=1, row=2, line_color="#666", line_width=2)
fig.add_hline(y=100, col=1, row=2, line_color="#666", line_width=2)
# Add overbought/oversold
fig.add_hline(y=30, col=1, row=2, line_color='#336699', line_width=2, line_dash='dash')
fig.add_hline(y=70, col=1, row=2, line_color='#336699', line_width=2, line_dash='dash')
fig.show()
fig.write_html("data/results/reports/bb_rsi_plot.html")
Crossings of the Bollinger Bands (BBperc would be negative or above 1) can indicate an imminent movement back to the mean (SMA). A sell or buy recommendation would be triggered by re-entry into the Bollinger band. The RSI is mostly used in combination with upper and lower limits, such as 70%/30%, indicating that the stock is overbought/oversold.
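The band-crossing and re-entry rules can be expressed directly in terms of %B. The sketch below computes %B from scratch on toy data; the window length and band width are illustrative assumptions, not the pandas_ta settings used above:

```python
import pandas as pd

close = pd.Series([10.0, 11.0, 10.0, 11.0, 10.0, 7.0, 10.0, 11.0, 10.0, 11.0])

mid = close.rolling(window=5).mean()
std = close.rolling(window=5).std()
upper, lower = mid + 2 * std, mid - 2 * std

# %B: position of the close within the band (0 = lower band, 1 = upper band)
perc_b = (close - lower) / (upper - lower)

outside = (perc_b < 0) | (perc_b > 1)                 # close crossed a band
reentry_buy = (perc_b >= 0) & (perc_b.shift(1) < 0)   # re-entry from below
```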
# mark local extrema (to maybe predict if the next day is going to be a local minimum or maximum)
def mark_localExtrema(df, col="close", n=5, plot=False):
"""
input:
df (DataFrame): dataframe that includes the timeseries whose extrema shall be found
col (str): name of the dataframe column that contains the timeseries
n (int): number of timesteps to be checked before and after
plot (bool): if True, a plot with the extrema marked will be displayed
output:
df (DataFrame): the input dataframe with two binary columns marking the minima and maxima
"""
minName = "{}dayMinimum".format(str(n))
maxName = "{}dayMaximum".format(str(n))
df[minName] = df.iloc[argrelextrema(df[col].values, np.less_equal,
order=n)[0]][col]
df[maxName] = df.iloc[argrelextrema(df[col].values, np.greater_equal,
order=n)[0]][col]
if plot==True:
fig, ax = plt.subplots()
ax.plot(df[col])
ax.plot(df[minName], marker="o", color="green")
ax.plot(df[maxName], marker="o", color="red")
plt.title("{}-Day-Extrema".format(str(n)))
# 1 if day is an extremum, 0 if not
df[minName] = (df[minName]/df[minName]).fillna(0)
df[maxName] = (df[maxName]/df[maxName]).fillna(0)
df["{}dayExtremum".format(str(n))] = df[minName]-df[maxName]
return df
for days in [5, 20, 60]:
stockHist_comp = mark_localExtrema(stockHist_comp, "close_adj", days, plot=True)
# create an indicator with value 1 if the target value is higher on each of the n day(s) after day x, -1 if it is lower on each of them, and 0 otherwise
# (similar to extrema, but not identical: extrema consider higher/lower values on both sides, while this weak trend indicator only looks at the "right"/future side)
def create_nDayTrendWeakIndicator(df, target, n, plot=False):
"""
input:
df (DataFrame): data frame containing the target value
target (str): name of target value
n (int): number of days the target value has to be higher or lower afterwards
plot (bool): output will be a plot if True
output:
df (DataFrame): data frame containing the input data frame supplemented with the new indicator(s)
"""
df["temp0"] = 1
for days in range(1,n+1):
df["temp"] = (df[target] - df[target].shift(days)) / -abs(df[target] - df[target].shift(days))
df["temp0"] = df["temp0"] + df["temp"]
df["{}dayTrendWeak".format(str(days))] = df["temp0"].apply(lambda x : 1 if (x == n+1) else (-1 if (x == -n+1) else 0))
df.drop(columns=["temp0", "temp"], inplace=True)
if plot:
plot_data = [
go.Scatter(
x=df.index,
y=df[target],
marker_color="blue",
name='target values'
),
go.Scatter(
x=df.index,
y=df["{}dayTrendWeak".format(str(days))].replace(0, np.nan).replace(-1, np.nan)*df[target],
marker_color="green",
mode="markers",
opacity=.6,
marker_size=10,
name='weak positive trend'
),
go.Scatter(
x=df.index,
y=-df["{}dayTrendWeak".format(str(days))].replace(0, np.nan).replace(1, np.nan)*df[target],
marker_color="red",
mode="markers",
opacity=.6,
marker_size=10,
name='weak negative trend'
)
]
plot_layout = go.Layout(title='Trend Indicator ({}-day-weak)'.format(n), template="plotly_white")
fig = go.Figure(data=plot_data, layout=plot_layout)
fig.show()
fig.write_html("data/results/reports/trend_weak_plot.html")
return df
stockHist_comp = create_nDayTrendWeakIndicator(stockHist_comp, "close_adj", n=5, plot=True)
# create indicator that has value 1 if the next n day(s) after day x, the target value increases EVERY day without pullback, -1 if it decreases EVERY day and 0 if not
def create_nDayTrendStrongIndicator(df, target, n, plot=False):
"""
input:
df (DataFrame): data frame containing the target value
target (str): name of target value
n (int): number of days the target value has to rise or fall continuously from one day to the next
plot (bool): output will be a plot if True
output:
df (DataFrame): data frame containing the input data frame supplemented with the new indicator(s)
"""
df["temp0"] = 1
for days in range(1,n+1):
df["temp"] = (df[target].shift(days-1) - df[target].shift(days)) / -abs(df[target].shift(days-1) - df[target].shift(days))
df["temp0"] = df["temp0"] + df["temp"]
df["{}dayTrendStrong".format(str(days))] = df["temp0"].apply(lambda x : 1 if (x == n+1) else (-1 if (x == -n+1) else 0))
df.drop(columns=["temp0", "temp"], inplace=True)
if plot:
plot_data = [
go.Scatter(
x=df.index,
y=df[target],
marker_color="blue",
name='target values'
),
go.Scatter(
x=df.index,
y=df["{}dayTrendStrong".format(str(days))].replace(0, np.nan).replace(-1, np.nan)*df[target],
marker_color="green",
mode="markers",
opacity=.6,
marker_size=10,
name='strong positive trend'
),
go.Scatter(
x=df.index,
y=-df["{}dayTrendStrong".format(str(days))].replace(0, np.nan).replace(1, np.nan)*df[target],
marker_color="red",
mode="markers",
opacity=.6,
marker_size=10,
name='strong negative trend'
)
]
plot_layout = go.Layout(title='Trend Indicator ({}-day-strong)'.format(n), template="plotly_white")
fig = go.Figure(data=plot_data, layout=plot_layout)
fig.show()
fig.write_html("data/results/reports/trend_strong_plot.html")
return df
stockHist_comp = create_nDayTrendStrongIndicator(stockHist_comp, "close_adj", n=5, plot=True)
# calculate the return of close_adj after n future timesteps, in absolute terms and in %
def calc_futureReturn(df, n):
"""
input:
df (DataFrame): data frame containing the close_adj values for the return calculation
n (int): number of days, the return shall be calculated with
output:
df (DataFrame): input dataframe supplemented with the new indicators/features
"""
df["{}dayReturn".format(str(n))] = df.close_adj.shift(n) - df.close_adj
df["{}dayReturn_perc".format(str(n))] = 1 - (df.close_adj.shift(n) / df.close_adj)
return df
for days in [1, 2, 3, 4, 5]:
stockHist_comp = calc_futureReturn(stockHist_comp, days)
fig = go.Figure()
fig.update_layout(title="Return_perc after the following day", yaxis_title="Return [%]", template="plotly_white", showlegend=True)
fig.add_trace(go.Scatter(x=stockHist_comp.index, y=stockHist_comp["1dayReturn_perc"], name="1dayReturn_perc"))
fig.add_hline(y=0)
fig.show()
fig.write_html("data/results/reports/1dayReturn_perc_plot.html")
# add future n close_adj values to each timestep
def add_futureValues(df, n):
"""
input:
df (DataFrame): data frame containing the close_adj values
n (int): function will add a column where each row shows the close_adj value of the n'th day after the actual close_adj value
output:
df (DataFrame): input dataframe supplemented with the new indicators/features
"""
df["close_adj_in{}days".format(str(n))] = df.close_adj.shift(n)
return df
for days in [1, 2, 3, 4, 5]:
stockHist_comp = add_futureValues(stockHist_comp, days)
bbcross = np.empty(len(stockHist_comp))
for i in range(0, len(stockHist_comp)-1):
if (stockHist_comp.BBperc.values[i] < 1 and stockHist_comp.BBperc.values[i+1] > 1):
bbcross[i] = -1
elif (stockHist_comp.BBperc.values[i] > 0 and stockHist_comp.BBperc.values[i+1] < 0):
bbcross[i] = 1
else:
bbcross[i] = 0
bbcross[-1] = 0
stockHist_comp["BBcross"] = bbcross
fig = go.Figure()
fig.add_trace(go.Scatter(x=stockHist_comp.index, y=stockHist_comp.close_adj, name="close_adj"))
fig.add_trace(go.Scatter(x=stockHist_comp.index, y=stockHist_comp.close_adj*(stockHist_comp.BBcross.replace(0, np.nan).replace(1, np.nan).abs()), mode="markers", opacity=.6, marker_color="red", marker_size=10, name="Sell"))
fig.add_trace(go.Scatter(x=stockHist_comp.index, y=stockHist_comp.close_adj*stockHist_comp.BBcross.replace(0, np.nan).replace(-1, np.nan), mode="markers", opacity=.6, marker_color="green", marker_size=10, name="Buy"))
fig.update_layout(title="Buy-Sell-Keep-Recommendation by BBperc", template="plotly_white")
A sell recommendation becomes stronger with:
A buy recommendation is indicated when the above conditions move in the opposite direction.
# try to transform indicators to a form, where negative/positive values equal a sell/buy recommendation (the higher the absolute value, the stronger)
scaler = MinMaxScaler(feature_range=(-1, 1))
stockHist_comp["diffCloseSMA_norm"] = scaler.fit_transform(stockHist_comp[["diffCloseSMA"]])[:,0]*-1
stockHist_comp["BBperc_norm"] = scaler.fit_transform(stockHist_comp[["BBperc"]])[:,0]*-1
stockHist_comp["RSI_norm"] = scaler.fit_transform(stockHist_comp[["RSI"]])[:,0]*-1
def create_buySellKeepRec(df):
df["buySellKeepRec"] = 0
df["buySellKeepRec"] = df.diffCloseSMA_norm + df.BBperc_norm + df.BBcross + df.RSI_norm
df.buySellKeepRec.fillna(0, inplace=True)
return df
stockHist_comp = create_buySellKeepRec(stockHist_comp)
def SetColor(x):
if(x < -1):
return "red"
elif(-1<= x <=1):
return "white"
elif(x > 1):
return "green"
show_df = stockHist_comp.loc[(stockHist_comp.buySellKeepRec>=2) | (stockHist_comp.buySellKeepRec<=-2.5)]
#fig.add_trace(go.Scatter(x=stockHist_comp.index, y=stockHist_comp.close_adj, mode="lines", name="close_adj"))
# BB
fig = make_subplots(rows=2, cols=1, shared_xaxes=True, row_width=[0.25, 0.75])
fig.update_layout(title="Buy-Sell-Keep-Recommendation", yaxis_title="Price", template="plotly_white")
fig.add_trace(go.Scatter(x=stockHist_comp.index, y=stockHist_comp.SMA, name="SMA", line_color="blue"), row=1, col=1)
fig.add_trace(go.Scatter(x=stockHist_comp.index, y=stockHist_comp.BBlow, name="BBlow", line_color="black"), row=1, col=1)
fig.add_trace(go.Scatter(x=stockHist_comp.index, y=stockHist_comp.BBup, name="BBup", line_color="black"), row=1, col=1)
fig.add_trace(go.Scatter(x=stockHist_comp.index, y=stockHist_comp.close_adj, name="close_adj", line_color="darkred", line_width=.5), row=1, col=1)
fig.add_trace(go.Scatter(x=show_df.index, y=show_df.close_adj, mode="markers", customdata=show_df.buySellKeepRec, name="Buy-Sell-Keep-Recommendation",
marker=dict(size=(show_df.buySellKeepRec.abs()+.2)*8, opacity=.6, color=(list(map(SetColor, show_df.buySellKeepRec)))), hovertemplate="%{customdata:.1f}"), row=1, col=1)
# RSI
fig.add_trace(go.Scatter(x=stockHist_comp.index, y=stockHist_comp.RSI, name="RSI", line_color="orange"), row=2, col=1)
# Add upper/lower bounds
fig.update_yaxes(range=[-10, 110], row=2, col=1)
fig.add_hline(y=0, col=1, row=2, line_color="#666", line_width=2)
fig.add_hline(y=100, col=1, row=2, line_color="#666", line_width=2)
# Add overbought/oversold
fig.add_hline(y=30, col=1, row=2, line_color='#336699', line_width=2, line_dash='dash')
fig.add_hline(y=70, col=1, row=2, line_color='#336699', line_width=2, line_dash='dash')
fig.show()
fig.write_html("data/results/reports/bskr_plot.html")
show_df[["close_adj", "SMA", "diffCloseSMA", "diffCloseSMA_norm", "BBperc", "BBperc_norm", "BBcross", "RSI", "RSI_norm", "buySellKeepRec"]]
| close_adj | SMA | diffCloseSMA | diffCloseSMA_norm | BBperc | BBperc_norm | BBcross | RSI | RSI_norm | buySellKeepRec | |
|---|---|---|---|---|---|---|---|---|---|---|
| date | ||||||||||
| 2021-07-20 | 83.00000 | 88.26400 | -5.26400 | 0.20473 | 0.03897 | 0.58008 | 1.00000 | 31.61677 | 0.43906 | 2.22387 |
| 2021-06-08 | 95.52000 | 87.22669 | 8.29331 | -0.85948 | 0.96277 | -0.48932 | -1.00000 | 80.70546 | -0.78495 | -3.13375 |
| 2020-11-12 | 69.05880 | 62.18078 | 6.87802 | -0.74838 | 0.99291 | -0.52422 | -1.00000 | 75.32179 | -0.65071 | -2.92331 |
| 2020-10-30 | 57.31780 | 61.68116 | -4.36336 | 0.13403 | 0.00299 | 0.62173 | 1.00000 | 28.75890 | 0.51032 | 2.26608 |
| 2020-09-10 | 62.26040 | 58.46166 | 3.79874 | -0.50667 | 0.96205 | -0.48850 | -1.00000 | 80.12004 | -0.77035 | -2.76552 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2010-01-28 | 19.55210 | 20.86867 | -1.31658 | -0.10513 | 0.00558 | 0.61874 | 1.00000 | 27.52415 | 0.54111 | 2.05472 |
| 2007-11-12 | 25.33500 | 27.27301 | -1.93801 | -0.05635 | 0.05382 | 0.56289 | 1.00000 | 28.13383 | 0.52590 | 2.03244 |
| 2007-03-06 | 25.35210 | 26.90670 | -1.55460 | -0.08645 | 0.01398 | 0.60901 | 1.00000 | 27.87207 | 0.53243 | 2.05500 |
| 2006-12-04 | 24.41680 | 25.71762 | -1.30082 | -0.10637 | 0.06492 | 0.55004 | 1.00000 | 25.89777 | 0.58166 | 2.02534 |
| 2006-01-23 | 20.93190 | 21.59232 | -0.66042 | -0.15664 | 0.00132 | 0.62366 | 1.00000 | 23.33989 | 0.64544 | 2.11246 |
73 rows × 10 columns
My expectation when I started the project was that there would be two or three commonly used techniques, but the deeper I dive into the field of timeseries forecasting and machine learning for trading, the more I understand that I opened Pandora's box. To clear my mind, I will try to cluster the main strategies and algorithms that I found (by far not all) to be used for the given task(s).
After the research I decided to start with an LSTM recurrent neural network, which seems to be a popular approach for timeseries forecasting.
The TensorFlow Keras module offers a class called TimeseriesGenerator that divides past timeseries data into chunks/windows with separate input and target arrays. In order to understand its parameters, I tried to visualize it in the following diagram, hoping it would serve my purpose of prediction. Every color in the plot stands for a single window of the whole timeseries that will be used during training to predict one step ahead (big black dot). The window length as well as the starting points can be parametrized.
# divide the whole historic data into batches without scaling for visualization purposes
batch_size=1 # number of windows per generated batch
win_length=500 # timesteps of stock data in a batch
stride=win_length+1 # timesteps between starting points of windows (hopping windows)
df = stockHist_comp.sort_values(by="date", ascending=True)[["close_adj"]].dropna()
df.reset_index(inplace=True)
tsg = TimeseriesGenerator(df[["date", "close_adj"]].to_numpy(), df[["date", "close_adj"]].to_numpy(), length=win_length,
stride=stride, sampling_rate=1, batch_size=batch_size)
print("number of timesteps before timeseriesgeneration: ", len(df))
def timeseriesgenerator_decomposition(tsg):
print("windows in tsg: ", len(tsg))
print("1 input + 1 output array: ", len(tsg[0]))
print("window, training timesteps per window, input-features: ", tsg[0][0].shape)
print("window, output-features (for only one timestep): ", tsg[0][1].shape)
print("input window shape: ", tsg[0][0][0].shape)
print("output window shape: ", tsg[0][1][0].shape)
print("last timestep of the first input window: ", tsg[0][0][0][-1])
print("output value(s) for the first window: ", tsg[0][1][0])
timeseriesgenerator_decomposition(tsg)
# plot windows
fig = go.Figure()
fig.update_layout(title="Timeseries Chunks generated by the TimeSeriesGenerator function",
yaxis_title="Price", template="plotly_white")
fig.add_trace(go.Scatter(x=df.date, y=df.close_adj, name="close_adj"))
for e in range(len(tsg)):
win_inp = pd.DataFrame(tsg[e][0][0])
win_inp.columns = ["date", "close_adj"]
win_inp.index = win_inp.date
win_inp.drop(columns=["date"], inplace=True)
fig.add_trace(go.Scatter(x=win_inp.index, y=win_inp.close_adj, name="element "+str(e)))
fig.add_trace(go.Scatter(mode="markers", x=[tsg[e][1][0][0]], y=[tsg[e][1][0][1]],
marker=dict(symbol="circle", size=15, color="black"), showlegend=False))
fig.show()
fig.write_html("data/results/reports/timeseriesgenerator_plot.html")
# usually "stride" doesn't need to be set to window length, but it helps for the visualization
number of timesteps before timeseriesgeneration: 4306
windows in tsg: 8
1 input + 1 output array: 2
window, training timesteps per window, input-features: (1, 500, 2)
window, output-features (for only one timestep): (1, 2)
input window shape: (500, 2)
output window shape: (2,)
last timestep of the first input window: [Timestamp('2006-12-11 00:00:00') 25.2109]
output value(s) for the first window: [Timestamp('2006-12-12 00:00:00') 25.1991]
Unfortunately, after having spent a lot of time understanding the workings and structure of Keras' TimeseriesGenerator, I found out that it supposedly only works for single-step predictions and not for multi-step & multi-output predictions (like several feature values for several days in the future). In those cases I would have to write my own functions to create timeseries windows. However, for the first edition of this project I will stick with a one-timestep prediction of the close_adj value and use the TimeseriesGenerator as described.
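For the multi-step & multi-output case, such a custom windowing function could be sketched as follows (names and shapes are my own illustration, not part of the project code):

```python
import numpy as np

def make_windows(data: np.ndarray, targets: np.ndarray, win_length: int, horizon: int):
    """Build input/output windows for multi-step forecasting.

    data:    (timesteps, n_features) array, sorted oldest to newest
    targets: (timesteps,) or (timesteps, n_targets) array
    Returns X of shape (n_windows, win_length, n_features) and
            Y holding the `horizon` target steps that follow each window.
    """
    X, Y = [], []
    for start in range(len(data) - win_length - horizon + 1):
        X.append(data[start:start + win_length])
        Y.append(targets[start + win_length:start + win_length + horizon])
    return np.array(X), np.array(Y)

# toy series 0..19 with a single feature column
series = np.arange(20, dtype=float).reshape(-1, 1)
X, Y = make_windows(series, series[:, 0], win_length=5, horizon=3)
# X.shape == (13, 5, 1); Y[0] holds the 3 steps after the first window: [5., 6., 7.]
```

With horizon=1 this degenerates to the same single-step windows the TimeseriesGenerator produces.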
The following code splits the historic stock data, including some indicators, into a train and a test data set. For both data sets the TimeseriesGenerator creates a bunch of windows for model training and validation. In this run I use two features (close_adj and SMA; more are prepared in the commented-out list) over a timespan of 10 workdays (two trading weeks) to predict the close_adj value of the next day.
# df needs to be sorted from old to new data! Target value has to be first feature in df!
df = stockHist_comp.sort_values(by="date", ascending=True)[["close_adj", "SMA"]] # , "EMA", "RSI", "volume", "diffCloseSMA", "BBperc", "1dayReturn_perc", "buySellKeepRec"
eval_df = pd.DataFrame()
eval_plots = []
# create a list of configs to try (gridsearch)
def model_configs():
# define scope of configs
test_size = [.1, .2]
batch_size = [32]
win_length = [10]
epochs = [20]
patience = [8]
learning_rate = [0.001]
metrics = [['mse', 'mae', 'mape', 'msle', 'logcosh']] # tf.metrics.MeanAbsoluteError()
loss = ['mse'] # tf.losses.MeanSquaredError()
# create configs (cartesian product of all parameter lists)
import itertools
configs = [list(cfg) for cfg in itertools.product(test_size, batch_size, win_length, epochs, patience, learning_rate, metrics, loss)]
print('Total configs: %d' % len(configs))
return configs
configs = model_configs()
#print("All configs: ")
#for i, cfg in enumerate(configs):
# print("\nConfig {}:".format(i))
# for z in cfg: print(z)
start = time.time()
for i, cfg in enumerate(tqdm(configs)):
test_size, batch_size, win_length, epochs, patience, learning_rate, metrics, loss = cfg
#print("\nCurrent config: Nr. {} {}".format(i, cfg))
# scale according to the input value range -> if there are negative values, the data should be normalized between -1 and 1, otherwise between 0 and 1
# normalization should be executed per time window, otherwise the model is trained on lower values for stocks that continuously gain value (e.g. due to inflation etc.)
scaler = MinMaxScaler() # maybe use StandardScaler for prices, since the future min and max prices are unknown for now?
data_scaled = scaler.fit_transform(df.dropna())
input_data = data_scaled[:, :]
target = data_scaled[:, 0]
X_train, X_test, Y_train, Y_test = train_test_split(input_data, target, test_size=test_size, shuffle=False)
train_generator = TimeseriesGenerator(X_train, Y_train, length = win_length, sampling_rate = 1, batch_size = batch_size) # batch_size = number of windows per training batch
test_generator = TimeseriesGenerator(X_test, Y_test, length = win_length, sampling_rate = 1, batch_size = 1)
model = tf.keras.Sequential()
model.add(tf.keras.layers.LSTM(8, input_shape = (win_length, input_data.shape[1]), return_sequences=True)) # return_sequences=True makes the model emit an output for every timestep (see the prediction shape further below)
model.add(tf.keras.layers.Dense(1))
#model.add(tf.keras.layers.LSTM(16, activation='relu', return_sequences=True))
#model.add(tf.keras.layers.Dense(72))
# model.summary()
early_stopping = tf.keras.callbacks.EarlyStopping(monitor="val_loss",
patience=patience,
mode="min" # if the val_loss doesn't change for n=patience iterations, stop.
)
# what does keras.callbacks.ModelCheckpoint do?
model.compile(loss = loss,
optimizer = tf.optimizers.Adam(learning_rate=learning_rate), # possible variations: SGD, RMSprop, Adam, Adadelta, Adagrad, Adamax, Nadam, Ftrl
metrics = metrics
)
history = model.fit(train_generator,
epochs = epochs,
validation_data = test_generator,
shuffle = False,
callbacks = [early_stopping],
verbose = 0
)
def visualize_loss(history, i):
loss = history.history["loss"]
val_loss = history.history["val_loss"]
epochs = range(len(loss))
fig, ax = plt.subplots()
ax.plot(epochs, loss, "b", label="Training loss")
ax.plot(epochs, val_loss, "r", label="Validation loss")
ax.set_title("Config {}: Training and Validation Loss".format(i))
ax.set_xlabel("Epochs")
ax.set_ylabel("Loss")
ax.legend()
# plt.show()
plt.close()
return fig, ax
eval_plots.append(visualize_loss(history, i))
eval_df = pd.concat([eval_df, pd.DataFrame(history.history).iloc[[-1]]]) # .append() is deprecated/removed in newer pandas
#print("\nEvaluation Metrics on Config No. {}: ".format(i))
#print(pd.DataFrame(history.history)[["loss", "val_loss"]].iloc[-1])
# evaluation results as data frame
eval_df.reset_index(inplace=True, drop=True)
config_df = pd.DataFrame(configs, columns=["test_size", "batch_size", "win_length", "epochs", "patience", "learning_rate", "metrics", "loss"])
eval_df = pd.concat([eval_df, config_df], axis=1)
eval_df
end = time.time()
print("Gridsearch Processing Time: {:.0f}s".format(end-start))
Total configs: 2
Gridsearch Processing Time: 52s
df = eval_df.copy()
for metric in ["mse", "mae", "mape", "msle", "logcosh"]:
# print("config with minimum for {} + val_{}".format(metric, metric))
df[metric+"_sum"] = df[metric] + df["val_"+metric]
eval_df[metric+"_sum"] = df[metric+"_sum"]
df[[metric, "val_"+metric, metric+"_sum"]].plot.bar(figsize=(10,1), legend=False, ylabel=metric)
for metric in ["mse", "mae", "mape", "msle", "logcosh"]:
display(df.loc[df[metric+"_sum"] == df[metric+"_sum"].min()][["mse_sum", "mae_sum", "mape_sum", "msle_sum", "logcosh_sum", "test_size", "batch_size", "win_length", "epochs", "patience", "learning_rate"]])
| mse_sum | mae_sum | mape_sum | msle_sum | logcosh_sum | test_size | batch_size | win_length | epochs | patience | learning_rate | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.00522 | 0.07711 | 10204.30684 | 0.00211 | 0.00260 | 0.20000 | 32 | 10 | 20 | 8 | 0.00100 |
| mse_sum | mae_sum | mape_sum | msle_sum | logcosh_sum | test_size | batch_size | win_length | epochs | patience | learning_rate | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.00522 | 0.07711 | 10204.30684 | 0.00211 | 0.00260 | 0.20000 | 32 | 10 | 20 | 8 | 0.00100 |
| mse_sum | mae_sum | mape_sum | msle_sum | logcosh_sum | test_size | batch_size | win_length | epochs | patience | learning_rate | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00700 | 0.08968 | 9096.55229 | 0.00266 | 0.00349 | 0.10000 | 32 | 10 | 20 | 8 | 0.00100 |
| mse_sum | mae_sum | mape_sum | msle_sum | logcosh_sum | test_size | batch_size | win_length | epochs | patience | learning_rate | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.00522 | 0.07711 | 10204.30684 | 0.00211 | 0.00260 | 0.20000 | 32 | 10 | 20 | 8 | 0.00100 |
| mse_sum | mae_sum | mape_sum | msle_sum | logcosh_sum | test_size | batch_size | win_length | epochs | patience | learning_rate | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.00522 | 0.07711 | 10204.30684 | 0.00211 | 0.00260 | 0.20000 | 32 | 10 | 20 | 8 | 0.00100 |
df = eval_df[["mse_sum", "mae_sum", "mape_sum", "msle_sum", "logcosh_sum", "test_size", "batch_size", "win_length", "epochs", "patience", "learning_rate"]]
display(df)
eval_df.to_csv("data/gridsearch/eval_df.csv")
| mse_sum | mae_sum | mape_sum | msle_sum | logcosh_sum | test_size | batch_size | win_length | epochs | patience | learning_rate | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00700 | 0.08968 | 9096.55229 | 0.00266 | 0.00349 | 0.10000 | 32 | 10 | 20 | 8 | 0.00100 |
| 1 | 0.00522 | 0.07711 | 10204.30684 | 0.00211 | 0.00260 | 0.20000 | 32 | 10 | 20 | 8 | 0.00100 |
# visualize train and validation losses
for i, plot in enumerate(eval_plots):
display(eval_plots[i][0])
eval_plots[i][0].savefig("data/results/plots/config_{}.png".format(i))
Parameters/Optimizers to vary:
Other Keywords:
""" following gridsearch attempt doesnt work, since the sequential lstm model has no method get_params?!
cv = 3
# define the grid search parameters
batch_size = [1, 10]
epochs = [5, 10]
param_grid = dict(batch_size=batch_size, epochs=epochs)
grid = GridSearchCV(estimator=model, param_grid=param_grid, scoring="max_error", n_jobs=-1, cv=cv) # cv=k-fold cross validation? n-jobs?
# By setting the n_jobs argument in the GridSearchCV constructor to -1, the process will use all cores on your machine.
# Depending on your Keras backend, this may interfere with the main neural network training process.
grid_result = grid.fit(train_generator)
# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
print("%f (%f) with: %r" % (mean, stdev, param))
"""
I wrote a function to save the "latest" model. It will overwrite the last model but also create a copy in a backup folder.
# function to save keras models including backups
def save_model(model, name):
timestamp = str(datetime.datetime.now().strftime("%Y%m%d_%H%M"))
model.save("data/models/{}/current/".format(name)) # overwrite recent model
model.save("data/models/{}/backup/{}/".format(name, timestamp)) # create backup with timestamp
# save model
# save_model(model, "model")
In case the training doesn't work as planned, or to save time, the latest model can be loaded with the following command.
# load model
# model = tf.keras.models.load_model("data/models/model/current/")
In order to visualize what the model predicts, I wrote some code that applies the model to a random timeseries window of the test data set and calculates an error from the difference between the predicted and the actual future close_adj price. This code snippet can be repeated manually for a better understanding. In a future edition of this project, I will use the code in a loop to calculate and compare the errors of different model parametrizations.
timeseriesgenerator_decomposition(test_generator)
windows in tsg: 848
1 input + 1 output array: 2
window, training timesteps per window, input-features: (1, 10, 2)
window, output-features (for only one timestep): (1,)
input window shape: (10, 2)
output window shape: ()
last timestep of the first input window: [0.74818182 0.64094535]
output value(s) for the first window: 0.7354548827239376
test_predictions = model.predict(test_generator)
test_predictions.shape
(848, 10, 1)
I don't fully understand why the prediction on the test_generator delivers a value for every input timestep of each window, i.e. shape (848, 10, 1); presumably this is caused by return_sequences=True in the LSTM layer (see chapter "Questions to the Reviewer").
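One way to read that shape: with return_sequences=True the Dense(1) head is applied to every timestep, so predict returns one value per input timestep. If that interpretation holds, the single next-step forecast per window can be recovered by slicing the last timestep, illustrated here with random stand-in data of the same shape:

```python
import numpy as np

# stand-in for the prediction tensor above: (n_windows, win_length, 1),
# because return_sequences=True makes the Dense(1) head emit one value
# per input timestep instead of only one per window
preds = np.random.default_rng(1).normal(size=(848, 10, 1))

# keep only the prediction made at the final timestep of each window
last_step = preds[:, -1, 0]   # shape (848,)
```

Alternatively, setting return_sequences=False on the last LSTM layer would make the model output a single value per window directly.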
error_list = []
x = 500 # number of test predictions
for i in range(x):
# evaluate the prediction of a random timeseries window (the scaler must match the format! - not implemented yet)
window_nr = random.randrange(len(test_generator))
hist = test_generator[window_nr][0][0]
fut = test_generator[window_nr][1][0]
pred = test_predictions[window_nr][-1]
empty = np.empty((1,input_data.shape[1]-1))
empty[:] = np.NaN
fut_resh = np.append(fut, empty).reshape(1, -1)
pred_resh = np.append(pred, empty).reshape(1, -1)
window = np.append(hist, fut_resh, axis=0)
window = np.append(window, pred_resh, axis=0) # append the prediction after the true value, so that iloc[-3]/[-2]/[-1] are the last history value, the true future value and the prediction
window = pd.DataFrame(scaler.inverse_transform(window))
# calculate error for close_adj as target
thresh = 0.05
plaus_min = window.iloc[-3, 0] + stockHist_comp["1dayReturn_perc"].min()*window.iloc[-3, 0]*(1+thresh)
plaus_max = window.iloc[-3, 0] + stockHist_comp["1dayReturn_perc"].max()*window.iloc[-3, 0]*(1+thresh)
# normalize fut and pred
fut_norm = (window.iloc[-2, 0] - plaus_min) / (plaus_max - plaus_min)
pred_norm = (window.iloc[-1, 0] - plaus_min) / (plaus_max - plaus_min)
error = abs((pred_norm-fut_norm)*100)
"""
# calculate error for 1dayReturn_perc as target
plaus_min = stockHist_comp["1dayReturn_perc"].min()*(1+thresh)
plaus_max = stockHist_comp["1dayReturn_perc"].max()*(1+thresh)
# normalize fut and pred
fut_norm = (window.iloc[-2, 0] - plaus_min) / (plaus_max - plaus_min)
pred_norm = (window.iloc[-1, 0] - plaus_min) / (plaus_max - plaus_min)
error = (pred_norm-fut_norm)*100
# double error if prediction has the opposite sign of the true value
if window.iloc[-2, 0]*window.iloc[-1, 0] < 0:
error = error*3
"""
error_list.append(error)
print("Mean Error for {} predictions: {:.2f}%".format(x, sum(error_list)/len(error_list)))
fig = go.Figure()
fig.update_layout(title="Random test data prediction compared to actual value",
yaxis_title="Price", xaxis_title="Timesteps", template="plotly_white")
# history values
fig.add_trace(go.Scatter(x=window.index[:-2], y=window.iloc[:-2, 0], name="History Data"))
# true value
fig.add_trace(go.Scatter(x=[window.index[-2]], y=[window.iloc[-2, 0]], name="Real Future Value", mode="markers",
marker=dict(symbol="circle-open-dot", color="green", size=15, opacity=.6, line=dict(width=2))))
# predicted value
fig.add_trace(go.Scatter(x=[window.index[-2]], y=[window.iloc[-1, 0]], name="Predicted Future Value", mode="markers+text",
marker=dict(symbol="y-up", color="red", size=15, line=dict(color="red", width=3)), text=["Error: {:.0f}%".format(error)], textposition="top left"))
# plaus_min & plaus_max
fig.add_hrect(y0=plaus_min, y1=plaus_max, line_width=1, fillcolor="lightblue", opacity=0.2, name="plausible value range")
fig.add_hline(y=plaus_min, line=dict(color="orange"), annotation_text="1dayReturn Minimum - 5%", annotation_position="bottom", name="plaus_min")
fig.add_hline(y=plaus_max, line=dict(color="orange"), annotation_text="1dayReturn Maximum + 5%", annotation_position="top", name="plaus_max")
fig.show()
fig.write_html("data/results/reports/backtest_plot.html")
Mean Error for 500 predictions: 10.88%
stockHist_comp.to_csv("data/results/{}_stockHist_comp.csv".format(symbol))
I created my own homepage with the help of a free Bootstrap template and published it on the web server of my personal Synology NAS, which unfortunately does not provide an easy way to work with the Apache backend server. Maybe in the future I will deploy the web app on a platform like Heroku with Flask, but for now it is more convenient to execute the stock analysis separately and only publish its results on my homepage:
Structure without Backend:
Structure for the backend part (not implemented yet due to difficulties with the Apache server configuration on my personal NAS)
import pandas_ta
import plotly.graph_objects as go
from plotly.offline import iplot

def plot_candlestick(df, name, window_size, save=False):
    INCREASING_COLOR = '#17BECF'
    DECREASING_COLOR = '#7F7F7F'

    # initial candlestick chart
    data = [dict(
        type='candlestick',
        open=df.open,
        high=df.high,
        low=df.low,
        close=df.close,
        x=df.index,
        yaxis='y2',
        name=name,
        increasing=dict(line=dict(color=INCREASING_COLOR)),
        decreasing=dict(line=dict(color=DECREASING_COLOR)),
    )]

    # create the layout object
    fig = dict(data=data, layout=dict())
    fig['layout']['plot_bgcolor'] = 'rgb(250, 250, 250)'
    fig['layout']['xaxis'] = dict(rangeselector=dict(visible=True), rangeslider=dict(visible=False))
    fig['layout']['yaxis'] = dict(domain=[0, 0.2], showticklabels=False, autorange=True, fixedrange=False)
    fig['layout']['yaxis2'] = dict(domain=[0.2, 0.8], autorange=True, fixedrange=False)
    fig['layout']['legend'] = dict(orientation='h', y=0.9, x=0.3, yanchor='bottom')
    fig['layout']['margin'] = dict(t=40, b=40, r=40, l=40)

    # add range buttons
    rangeselector = dict(
        visible=True,
        x=0, y=0.9,
        bgcolor='rgba(150, 200, 250, 0.4)',
        font=dict(size=13),
        buttons=[
            dict(label='reset', step='all'),
            dict(count=1, label='1yr', step='year', stepmode='backward'),
            dict(count=3, label='3 mo', step='month', stepmode='backward'),
            dict(count=1, label='1 mo', step='month', stepmode='backward'),
            dict(count=7, label='1 w', step='day', stepmode='backward'),
            dict(count=1, label='1 d', step='day', stepmode='backward'),
        ])
    fig['layout']['xaxis']['rangeselector'] = rangeselector

    # set volume bar chart colors (teal on up days, grey otherwise)
    colors = []
    for i in range(len(df.close)):
        if i != 0 and df.close.iloc[i] > df.close.iloc[i - 1]:
            colors.append(INCREASING_COLOR)
        else:
            colors.append(DECREASING_COLOR)

    # calculate Bollinger Bands for close values
    df[["BBlow", "BBmid", "BBup", "BBwidth", "BBperc"]] = pandas_ta.bbands(
        close=df.sort_values(by="date", ascending=True)["close"], length=window_size)
    # calculate SMA and EMA of close
    df['SMA'] = df.sort_values(by="date", ascending=True)["close"].rolling(window=window_size).mean()
    df['EMA'] = df.sort_values(by="date", ascending=True)["close"].ewm(span=window_size).mean()

    # add volume bar chart
    fig['data'].append(dict(x=df.index, y=df.volume, marker=dict(color=colors),
                            type='bar', yaxis='y', name='Volume'))
    # add Bollinger Bands, close, SMA and EMA overlays
    fig['data'].append(dict(x=df.index, y=df.BBup, type='scatter', yaxis='y2',
                            line=dict(width=1), marker=dict(color='#ccc'), hoverinfo='none',
                            legendgroup='Bollinger Bands', name='Bollinger Bands'))
    fig['data'].append(dict(x=df.index, y=df.BBlow, type='scatter', yaxis='y2',
                            line=dict(width=1), marker=dict(color='#ccc'), hoverinfo='none',
                            legendgroup='Bollinger Bands', showlegend=False))
    fig['data'].append(dict(x=df.index, y=df.close, type='scatter', yaxis='y2',
                            line=dict(width=2), marker=dict(color='black'), hoverinfo='none',
                            legendgroup='Close', showlegend=True, name="Close"))
    fig['data'].append(dict(x=df.index, y=df.SMA, type='scatter', yaxis='y2',
                            line=dict(width=1), marker=dict(color='blue'), hoverinfo='none',
                            legendgroup='SMA', showlegend=True, name="SMA"))
    fig['data'].append(dict(x=df.index, y=df.EMA, type='scatter', yaxis='y2',
                            line=dict(width=1), marker=dict(color='violet'), hoverinfo='none',
                            legendgroup='EMA', showlegend=True, name="EMA"))

    # plot
    iplot(fig, filename='Plotly Finance Chart', validate=False)

    # save figure as html
    if save:
        fig = go.Figure(fig)
        fig.write_html("data/results/reports/finance_chart.html")

plot_candlestick(df=stockHist_comp, name=symbol, window_size=20, save=True)
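The RSI column shown in the table below is computed earlier in the notebook and not part of this excerpt. As an illustration only, an SMA-smoothed RSI can be written in plain pandas like this (note that pandas_ta's default RSI uses Wilder's smoothing, so the values will not match exactly):

```python
import pandas as pd

def simple_rsi(close: pd.Series, length: int = 14) -> pd.Series:
    # Relative Strength Index using simple moving averages of gains and
    # losses. pandas_ta's default RSI applies Wilder's smoothing (RMA)
    # instead, so this is an approximation for illustration.
    delta = close.diff()
    gain = delta.clip(lower=0).rolling(length).mean()
    loss = (-delta.clip(upper=0)).rolling(length).mean()
    rs = gain / loss
    return 100 - 100 / (1 + rs)
```

For a strictly rising price series, losses are zero and the RSI saturates at 100, which is a quick sanity check for the implementation.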
display(stockHist_comp[["close_adj", "volume", "SMA", "EMA", "diffCloseSMA", "BBperc", "RSI", "buySellKeepRec"]].iloc[0:1])
last_day_table = stockHist_comp[["close_adj", "volume", "SMA", "EMA", "diffCloseSMA", "BBperc", "RSI", "buySellKeepRec"]].iloc[0:1]
last_day_table = last_day_table\
.to_html()\
.replace('<table border="1" class="dataframe">','<table class="table table-striped">') # use bootstrap styling
| date | close_adj | volume | SMA | EMA | diffCloseSMA | BBperc | RSI | buySellKeepRec |
|---|---|---|---|---|---|---|---|---|
| 2021-12-16 | 89.64000 | 1364574.00000 | 89.54600 | 89.39577 | 0.09400 | 0.50882 | 51.30975 | -0.23166 |
html_string = '''
<html>
<head>
<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.1/css/bootstrap.min.css">
<style>body{ margin:0 100; background:white; }</style>
</head>
<body>
<h1>Stock Analysis of "''' + symbol + '''"</h1>
<!-- *** Section 1 *** -->
<h2>Quick Data Overview</h2>
<h3>Finance Chart</h3>
<iframe width="1000" height="550" frameborder="0" seamless="seamless" scrolling="no" src="finance_chart.html"></iframe>
<p></p>
<h3>Latest available data:</h3>
''' + last_day_table + '''
<p></p>
<h3>Close Prices, Dividends and Splits</h3>
<iframe width="1000" height="550" frameborder="0" seamless="seamless" scrolling="no" src="close_plot.html"></iframe>
<p></p>
<!-- *** Section 2 *** -->
<h2>Technical Analysis</h2>
<h3>Bollinger Bands, Moving Averages and RSI</h3>
<iframe width="1000" height="550" frameborder="0" seamless="seamless" scrolling="no" src="bb_rsi_plot.html"></iframe>
<p></p>
<h3>Distance to the mean</h3>
<iframe width="1000" height="550" frameborder="0" seamless="seamless" scrolling="no" src="diffCloseSMA_plot.html"></iframe>
<p></p>
<h3>1-Day Returns in [%]</h3>
<iframe width="1000" height="550" frameborder="0" seamless="seamless" scrolling="no" src="1dayReturn_perc_plot.html"></iframe>
<p></p>
<h3>Seasonal Decomposition</h3>
<iframe width="1000" height="550" frameborder="0" seamless="seamless" scrolling="no" src="seasonal_decomposition_plot.html"></iframe>
<p></p>
<iframe width="1000" height="550" frameborder="0" seamless="seamless" scrolling="no" src="seasonal_plot.html"></iframe>
<p></p>
<iframe width="1000" height="550" frameborder="0" seamless="seamless" scrolling="no" src="residual_plot.html"></iframe>
<p></p>
<iframe width="1000" height="550" frameborder="0" seamless="seamless" scrolling="no" src="seasonal_lag_plot.html"></iframe>
<p></p>
<iframe width="1000" height="550" frameborder="0" seamless="seamless" scrolling="no" src="seasonal_predict_plot.html"></iframe>
<p></p>
<h3>Trends</h3>
<iframe width="1000" height="550" frameborder="0" seamless="seamless" scrolling="no" src="trend_weak_plot.html"></iframe>
<p></p>
<iframe width="1000" height="550" frameborder="0" seamless="seamless" scrolling="no" src="trend_strong_plot.html"></iframe>
<p></p>
<iframe width="1000" height="550" frameborder="0" seamless="seamless" scrolling="no" src="bskr_plot.html"></iframe>
<p></p>
<!-- *** Section 3 *** -->
<h2>Predictions</h2>
<h3>Adjusted Close Value on the next day</h3>
<p>Disclaimer: The displayed chart is used as a placeholder and does <u>not</u> predict future stock values in any way!</p>
<iframe width="1000" height="550" frameborder="0" seamless="seamless" scrolling="no" src="backtest_plot.html"></iframe>
<p></p>
</body>
</html>'''
with open('data/results/reports/report.html', 'w') as f:
    f.write(html_string)
I wrote a small function that sends mails programmatically (with the help of a Google app password). Once finished, it shall inform or alert me whenever my server detects a strong buy or sell recommendation (according to my definitions above).
Use Case:
import smtplib
from email.message import EmailMessage

def send_message(subject, body, to):
    """
    input:
        subject (str): String describing the subject of the message
        body (str): String containing the message text
        to (str): String containing the mail address
    output:
        E-Mail sent via Gmail SMTP
    """
    with open('api_keys/google.txt') as f:
        google_app_password = f.readlines()[0]

    msg = EmailMessage()
    msg.set_content(body)
    msg["subject"] = subject
    msg["to"] = to

    user = "thomaskallnik@gmail.com"
    msg["from"] = user
    password = google_app_password  # generated Google app password

    server = smtplib.SMTP("smtp.gmail.com", 587)
    server.starttls()
    server.login(user, password)
    server.send_message(msg)
    server.quit()

# send_message("Error", "Why can't I send it as an SMS???", "thomaskallnik@gmail.com")
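The trigger itself is not implemented yet; a minimal sketch of the decision step could look like the following. The thresholds and the use of the buySellKeepRec score are my assumptions about the final setup, not part of the current code:

```python
def recommendation_alert(latest_rec, symbol, buy_threshold=0.8, sell_threshold=-0.8):
    """Return a (subject, body) tuple for send_message if the latest
    buySellKeepRec score crosses a threshold, else None.

    Threshold values are placeholders for illustration.
    """
    if latest_rec >= buy_threshold:
        return (f"Strong buy signal for {symbol}",
                f"buySellKeepRec = {latest_rec:.2f} >= {buy_threshold}")
    if latest_rec <= sell_threshold:
        return (f"Strong sell signal for {symbol}",
                f"buySellKeepRec = {latest_rec:.2f} <= {sell_threshold}")
    return None

# alert = recommendation_alert(0.91, symbol)
# if alert:
#     send_message(alert[0], alert[1], "thomaskallnik@gmail.com")
```

Keeping the decision separate from the mail delivery makes the threshold logic easy to test without actually sending anything.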
Even though my work on the project is not finished yet, this will be it for the first blog post. There is still a lot to improve and not all goals have been achieved, but here is what I learned for each requirement from the outline at the beginning:
During the project I continuously developed more ideas on how to improve the functionality and the accuracy of the analysis in the future. Here are some of them:

Thanks in advance for the review!